ABSTRACT


In this note we consider a simple reformulation of the tradi- 
tional power iteration algorithm for computing the station- 
ary distribution of a Markov chain. Rather than commu- 
nicate their current probability values to their neighbors at 
each step, nodes instead communicate only changes in prob- 
ability value. This reformulation enables a large degree of 
flexibility in the manner in which nodes update their values, 
leading to an array of optimizations and features, includ- 
ing faster convergence, efficient incremental updating, and 
a robust distributed implementation. 

While the spirit of many of these optimizations appears in
previous literature, we observe several cases where this uni- 
fication simplifies previous work, removing technical compli- 
cations and extending their range of applicability. We imple- 
ment and measure the performance of several optimizations 
on a sizable (34M node) web subgraph, seeing significant 
composite performance gains, especially for the case of in- 
cremental recomputation after changes to the web graph. 


Categories and Subject Descriptors 


H.3.3 [Information Storage and Retrieval]: Information 
Search and Retrieval 


General Terms 


Algorithms, Experimentation, Performance, Theory 
1. INTRODUCTION


Motivated largely by the success and scale of Google’s 
PageRank ranking function, much research has emerged on 
efficiently computing the stationary distributions of web- 
scale Markov chains, the mathematical mechanism under- 
lying PageRank. The main challenge is that the web graph 
is so large that its edges typically exist only in external memory,
and an explicit representation of its stationary distribution just
barely fits into main memory. The time required to compute the
stationary distribution is on the order of tens of hours to days,
and constant-factor improvements in running times can save
substantial time and money. Even for the common researcher with
interests in ranking research, computing and recomputing vectors of
ranks is a time-consuming process that greatly limits research
throughput.

As such, much work has been done on accelerating the 
performance of PowerIteration, the traditional approach to
computing stationary distributions. These optimizations
cover a spectrum of techniques, ranging from transformations
to the Markov chain that accelerate mixing, to efficient
heuristic updates that behave like PowerIteration, to clever
reuse of previously computed solutions. Most of these tech- 
niques are developed and evaluated in isolation, and it is 
unclear to what degree they can be effectively combined, 
both in terms of implementation and performance. 


1.1 Notation and Terminology 


Throughout this note we will frequently refer to vectors, 
matrices, and the scalars they comprise. For clarity, we 
consistently use lowercase letters (x) for vectors and capital 
letters (A) for matrices. For each, subscripted quantities 
($x_u$ and $A_{uv}$) are used to reference the scalar values at the
indexed coordinates. 


1.2 PageRank and PowerIteration


PageRank [2] is a system of scoring nodes in a directed 
graph based on the stationary distribution of a random walk 
on the directed graph. Conceptually, the score of a node 
corresponds to the frequency with which the node is visited 
as an individual strolls randomly through the graph. For 
technical reasons, the random walk is also encouraged to 
occasionally reset to a prespecified distribution, overcoming 
issues of weakly connected components in which a random 
surfer might get stuck and accelerating the rate at which a 
random walk approaches the stationary distribution. 

A random walk on n nodes can be described by an n x n
matrix P, where entry $P_{vu}$ is the probability that from node
u the walk next arrives at node v. Starting from a distribution
x over the nodes (x is a vector of n entries that are
non-negative and sum to one), after one step the distribution
becomes Px, and more generally after i steps the distribution
becomes $P^i x$.

We can decompose P into those transitions due to traversing
a web link, and those transitions due to random resetting.
Let the sparse matrix A have entries $A_{vu}$ equal to the probability
that from node u the walk traverses the link (u, v) to
node v. Additionally, we define the vector r with each coordinate
$r_u = 1 - \sum_v A_{vu}$, equal to the probability that the
walk chooses not to follow an arc from node u and
instead resets randomly to a node v chosen with probability
proportional to $d_v$. P can then be written as $P = A + d r^T$,
capturing both types of transitions.


PowerIteration is the traditional manner of computing the
stationary distribution of P, explicitly simulating the
dissemination of probability mass by repeatedly applying P
to a supplied initial distribution x. Under modest assumptions,
e.g. that all entries of d and r are positive, for any
initial distribution x, $P^i x$ converges to a unique stationary
distribution as i increases.


PowerIteration(P, x)
1. While (not converged) 
(a) Set x = Px 


While P itself is a dense matrix, since every node can reset to
any other node, we can efficiently compute Px by viewing it
as $Ax + d r^T x$. We assume that A is stored on disk in a sparse
format, perhaps as a list of (source, target, value) triples,
though there are more compact representations. Ax is then
computed using sparse matrix-vector multiplication: since
$(Ax)_v = \sum_u A_{vu} x_u$, we can populate the result vector by
scanning the edge file, for each non-zero $A_{vu}$ adding $A_{vu} x_u$
to coordinate v of the result. We can produce the vector
$d r^T x$ by determining $r^T x$ in a pass over r and x, and scaling
d appropriately before adding it to Ax.


For acceptable performance, we may only perform sequen- 
tial access to the edge file, which is too large to fit into main 
memory. Generally speaking, the number of non-zero entries 
in A is the limiting performance factor, both because of our 
need to scan over the edge file to read these entries, and also 
the random accesses to x each entry requires. The other op- 
erations, vector addition, scaling, and inner product, can all 
be done using sequential access to main memory. 
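The pass described above can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: the edge file is modeled as a Python list of (source, target, value) triples, and the names `power_iteration_step`, `power_iteration`, `tol`, and `max_passes` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source u, target v, weight A_vu)


def power_iteration_step(edges: List[Edge], x: List[float],
                         d: List[float], r: List[float]) -> List[float]:
    """Return P x = A x + d (r^T x) using one sequential scan of the edges."""
    result = [0.0] * len(x)
    # Sparse product: each non-zero A_vu adds A_vu * x_u to coordinate v.
    for u, v, a_vu in edges:
        result[v] += a_vu * x[u]
    # Reset term: one pass over r and x yields the scalar r^T x.
    reset_mass = sum(r_u * x_u for r_u, x_u in zip(r, x))
    for v, d_v in enumerate(d):
        result[v] += d_v * reset_mass
    return result


def power_iteration(edges: List[Edge], x: List[float], d: List[float],
                    r: List[float], tol: float = 1e-12,
                    max_passes: int = 1000) -> List[float]:
    """Repeat x = P x until the L1 change between passes drops below tol."""
    for _ in range(max_passes):
        nxt = power_iteration_step(edges, x, d, r)
        change = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if change < tol:
            break
    return x
```

On a three-node cycle with link weight 0.85 and uniform d, the iteration settles on the uniform distribution, as symmetry suggests.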


2. AN UPDATE-BASED ALGORITHM 


Oddly, we start our generalization of PowerIteration by
restricting the problem we address. There are many vectors
satisfying x = Px; any solution x can be multiplied by an
arbitrary scalar value and still satisfy the equality. Typically
we focus our attention on finding the vector x with $\|x\|_1 = 1$.
Let us instead focus on finding the vector x for which
$r^T x = 1$, and for which

$$x = Px = Ax + d r^T x = Ax + d.$$


As we will see in Theorem 1, if x = Ax + d, then x = Px.
Normalized, this vector is the stationary distribution of P.

Consider an analog of PowerIteration in which we repeatedly
set x = Ax + d. As with PowerIteration, this iterative
process will converge, and it converges to a vector satisfying
x = Ax + d. We can monitor convergence through the
vector y = Ax - x + d: so long as y is non-zero, x has not
yet converged. But y also tells us the direction in which to update
x; we advance to the next iterate of x by adding y, yielding
Ax + d. The first role is crucial: we must bring y to zero. But
we needn't be so rigid as to only ever add y to x. We might
instead add other vectors to x that yield forward progress,
maintaining y both as a convergence criterion and as
guidance in choosing updates to x.
Consider an algorithm that monitors y = Ax - x + d, but
is free to choose an arbitrary update vector z at any step,
advancing from x to x + z. It is not hard to appropriately
update y, as its new value satisfies

$$A(x + z) - (x + z) + d = y + Az - z.$$

For any update vector z, we can update y by passing z
through the matrix A, adding the result to y, and subtracting
z. Intuitively, z is extracted from y and propagated
across the links in A, informing nodes of changes in their
parents' values and insisting that they now update in turn.


Operationally, this is exactly the algorithmic framework 
that we will consider. However, it will be useful not to fix 
y in terms of a particular A, x, and d, but rather let it be 
specified as an input parameter, properly determined before 
the method is invoked. 


UpdateIteration(A, x, y)
1. While (updates y remain):
(a) Choose an update vector z.
(b) Set x = x + z.
(c) Set y = y + (Az - z).

While this framework is presently little more than a system
of bookkeeping, we will solidify how one might choose z to
shrink $\|y\|_1$, and which choices lead to efficient algorithms.
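To make the bookkeeping concrete, here is a minimal sketch of the framework with the simplest choice z = y, which reproduces the iteration x = Ax + d. It assumes an in-memory list of (source, target, value) triples; the names `apply_update`, `update_iteration`, `tol`, and `max_passes` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source u, target v, weight A_vu)


def apply_update(edges: List[Edge], x: List[float], y: List[float],
                 z: List[float]) -> None:
    """Steps (b) and (c): x = x + z and y = y + (A z - z), both in place."""
    for u, z_u in enumerate(z):
        x[u] += z_u
        y[u] -= z_u
    # Propagate z across the links: each A_vu adds A_vu * z_u to y_v.
    for u, v, a_vu in edges:
        y[v] += a_vu * z[u]


def update_iteration(edges: List[Edge], x: List[float], y: List[float],
                     tol: float = 1e-12, max_passes: int = 1000):
    """Run the loop with z = y until the L1 norm of y falls below tol."""
    for _ in range(max_passes):
        if sum(abs(y_u) for y_u in y) < tol:
            break
        z = list(y)  # simplest choice of update vector
        apply_update(edges, x, y, z)
    return x, y
```

Starting from x = 0 the correct initial value is y = d, and the returned x satisfies x = Ax + d up to the tolerance; dividing by its sum gives the stationary distribution.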


We now state two theorems regarding the limit and rate of
convergence of UpdateIteration(A, x, y). The proofs, while
short, are rote and unilluminating, and are deferred to Appendix A.
We first argue that choosing y = Ax - x + d
leads to a stationary vector of $P = A + d r^T$, but also, in a
rather oblique manner, describe where x ends up if we start
UpdateIteration(A, x, y) with an arbitrary y.

THEOREM 1. For vectors x, y, d and substochastic matrix
A, if y = Ax - x + d and d is a non-negative vector, then
defining the stochastic matrix $P = A + d r^T / \|d\|_1$,

$$\|Px - x\|_1 \le 2\|y\|_1 \quad\text{and}\quad \|x\|_1 \ge \|d\|_1 - \|y\|_1.$$


To reiterate, Theorem 1 not only describes the correct initial
value of y to arrive at a stationary vector of $P = A + d r^T$,
but also says that for any A, x, y, if the vector d satisfying
y = Ax - x + d is non-negative, then x arrives at the stationary
distribution of a random walk on A that resets to a
distribution proportional to d.
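The bound $\|Px - x\|_1 \le 2\|y\|_1$ can be checked directly from the definitions. The following is a short derivation sketch and not necessarily the argument of Appendix A. It assumes $r_u = 1 - \sum_v A_{vu}$ as in Section 1.2, a non-zero non-negative d, and writes $\mathbf{1}$ for the all-ones vector (notation introduced here).

```latex
\begin{align*}
\mathbf{1}^T y &= \mathbf{1}^T A x - \mathbf{1}^T x + \mathbf{1}^T d
               = (\mathbf{1} - r)^T x - \mathbf{1}^T x + \|d\|_1
               = \|d\|_1 - r^T x, \\
Px - x &= Ax - x + d\,\frac{r^T x}{\|d\|_1}
        = y - d + d\,\frac{r^T x}{\|d\|_1}
        = y - d\,\frac{\mathbf{1}^T y}{\|d\|_1}, \\
\|Px - x\|_1 &\le \|y\|_1 + |\mathbf{1}^T y| \le 2\,\|y\|_1 .
\end{align*}
```

In words, the residual of P differs from y only by a multiple of d whose mass equals the total of y, so driving y to zero drives Px - x to zero as well.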


While the limit of x is well defined, choosing z arbitrarily
clearly need not result in rapid, or any, convergence to this
limit. Much as y leads x to its limit, vectors z whose coordinates
agree with those of y also exhibit brisk convergence
of $\|y\|_1$ to zero.


THEOREM 2. If each $z_u$ lies between zero and $y_u$, then

$$\|y + Az - z\|_1 \le \|y\|_1 - \sum_u r_u |z_u|.$$

When all $r_u$ are equal, the exponential convergence of
PowerIteration is a special case of Theorem 2: processing
z = y each round reduces $\|y\|_1$ by a factor of $1 - r_u$. Moreover,
when r is not uniform, Theorem 2 gives a tighter characterization
of progress than eigenvalue bounds, which are
generally in terms of the smallest $r_u$ value. Finally, and
critically, Theorem 2 describes progress made when we process
an update $z \ne y$, and informs us as to where in y the
progress is being made.
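As a sanity check rather than a proof, the inequality of Theorem 2, namely that the L1 norm of y + Az - z is at most the L1 norm of y minus the sum of r_u |z_u|, can be tested numerically on random instances. The sketch below builds random substochastic matrices column by column; the function name `check_theorem_2` and all of its parameters are hypothetical.

```python
import random


def l1(vec):
    """L1 norm of a list of floats."""
    return sum(abs(t) for t in vec)


def check_theorem_2(n=50, trials=200, seed=0):
    """Return True if the bound held on every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        # Column u of A sums to 1 - r_u, spread over three random targets.
        r = [rng.uniform(0.05, 0.5) for _ in range(n)]
        columns = []
        for u in range(n):
            targets = rng.sample(range(n), 3)
            weights = [rng.random() + 0.01 for _ in targets]
            scale = (1.0 - r[u]) / sum(weights)
            columns.append([(v, w * scale) for v, w in zip(targets, weights)])
        y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        z = [rng.random() * y_u for y_u in y]  # each z_u lies between 0 and y_u
        new_y = [y_u - z_u for y_u, z_u in zip(y, z)]
        for u in range(n):
            for v, a_vu in columns[u]:
                new_y[v] += a_vu * z[u]
        bound = l1(y) - sum(r_u * abs(z_u) for r_u, z_u in zip(r, z))
        if l1(new_y) > bound + 1e-9:  # small slack for rounding
            return False
    return True
```

The check allows y to have mixed signs, which is the regime where cancellation in Az can make the actual decrease larger than the guaranteed one.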


3. ACCELERATION TECHNIQUES 


In this section we consider several manners of choosing 
the vector z in UpdateIteration that give rise to various
acceleration techniques. Most have occurred in some form 
previously in the literature, and we will discuss the often 
significant differences between their current and previous 
incarnations. In each case we will find shortcomings of pre- 
vious techniques that are resolved by casting them in our 
common framework. Additionally, a significant advantage 
is the simple manner in which the techniques now compose, 
both from an algorithmic and performance perspective. 

We also present experimental data detailing the perfor- 
mance of the acceleration techniques and compositions we 
discuss on a 34M page crawl from 2002 containing roughly 
800M edges. The pages are organized first by host, where 
hosts are sorted by crawl discovery order, and within each 
host by crawl discovery order. Several benefits of such an 
ordering are discussed in Kamvar et al. [6], who use a more 
thorough sorting within each host. We choose to order pages 
by hosts, independent of the significant performance gains 
noted in [6], because one of our optimizations relies on it, 
and we require a consistent experimental framework. 

For each approach, we plot the total error $\|Px - x\|_1 / \|x\|_1$
against computational effort, measured in units of 800M
edges processed, corresponding to the effort required by a
single pass of PowerIteration. The normalization by $\|x\|_1$ is
required because our vector x need not remain at unit norm,
and it would be unfair for a vector to achieve small $\|Px - x\|_1$
simply by virtue of a small x. In reading the graphs, the acceleration
can be seen in the ratios of effort along a fixed
(horizontal) level of error.


3.1 Sequential Updates 


In choosing an update vector z, each coordinate makes 
a commitment to the update zu it intends, at which point 
each update is applied to x and propagated through A in 
parallel. However, in most implementations these updates 
will be processed serially, typically reading and propagating 
each zu in turn. As the zu may be chosen arbitrarily, there 
is no need for a node to commit to a particular value until 
it is needed. Rather, we can delay the choice of zu until 
it is needed, conceptually processing a long series of single 
coordinate updates of the form z = (0,...0, zu,0,...0). 

Sequential updating allows us to base zu on a value of yu 
that reflects all updates applied thus far, even those applied 
in the current iteration, allowing us to propagate the effect of 
a single update multiple times in a single pass over the edge 
file. Even if the nodes are ordered randomly, roughly half 
of the edges will point forward in the node order. Updates 
passed along these edges will be processed again before we 
complete a pass over the edge file. Well organized graphs 
can benefit even more, with updates pushed along entire 
acyclic subgraphs in one pass. 
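A minimal sketch of one sequential pass follows, assuming edges are held in memory grouped by source node as lists of (target, weight) pairs. The name `sequential_pass` and the `reverse` flag are hypothetical; the flag simply scans the nodes in the opposite order.

```python
def sequential_pass(out_edges, x, y, reverse=False):
    """One pass of single-coordinate updates z = (0, ..., 0, z_u, 0, ..., 0).

    out_edges[u] is a list of (v, A_vu) pairs; x and y are updated in place.
    """
    n = len(x)
    order = range(n - 1, -1, -1) if reverse else range(n)
    for u in order:
        # z_u is chosen only now, so it already reflects every update
        # applied earlier in this same pass.
        z_u = y[u]
        if z_u == 0.0:
            continue
        x[u] += z_u
        y[u] -= z_u
        for v, a_vu in out_edges[u]:
            y[v] += a_vu * z_u
```

On an acyclic chain 0 -> 1 -> 2 visited in topological order, a single pass drives y to zero, while the reverse scan leaves updates waiting at the downstream nodes.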

Sequential updating is based on a specific ordering of 
nodes, and clearly some orderings are better than others. 
The ordering we use is based on crawl order, which has the 
peculiar property that 80% of the edges point backwards;
crawling very quickly discovers pages with many incoming 
links, and placing them early reverses the direction of the 
bulk of their links. As any ordering can easily be run in 
both orders, forwards or backwards or both, we will also 
consider sequential updating in reverse order. Experimentally,
alternating direction, forwards then backwards,
performs more poorly than either unidirectional approach.
This is peculiar, and merits further investigation.

Figures 1 and 2 compare traditional PowerIteration (PI)
against sequential updating (SU) and sequential updating
applied in reverse order (R-SU). It should be stressed that
these techniques exhibit exactly the same data access patterns
as traditional PowerIteration, passing linearly over the
edge file and probing main memory for each edge $A_{vu}$.
The approaches differ only in what they do for each $A_{vu}$
(and the direction of scan, for R-SU). Their running times
are effectively identical to PowerIteration.
Figure 1: Sequential Updates: Total Error


We see acceleration of nearly 2x and 3x for sequential up- 
dates on the crawl ordered and reverse crawl ordered graphs, 
respectively, with the gap between SU and R-SU diminish- 
ing with time. 
Figure 2: Sequential Updates: Maximum Error


Here we see again a lead of reverse crawl ordering, but the 
lead is less initially, growing after several iterations. 


Related Work: Sequential updating is similar to the Gauss- 
Seidel approach described by Arasu et al. [1], in which one 
sequentially sets $x_u = \sum_v P_{uv} x_v$, using the most current
values of x, rather than those of the previous iteration. In
contrast with UpdateIteration, the Gauss-Seidel approach
requires the graph’s edges to be grouped by destination, 
rather than source, which can substantially complicate data 
maintenance. 


3.2 Reiterated Updates 


A large fraction of links in the web graph are intra-host, 
and as such it is common to group pages by host for local- 
ity benefits, discussed in [6]. Given such a grouping, after 
processing the nodes associated with a host, a large frac- 
tion of the propagated update z will return to nodes on that 
host. While this may seem frustrating at first, recall that 
Theorem 2 says that $\|y\|_1$ decreases by at least $r_u |z_u|$,
independent of where Az ends up. Moreover, various caches
will retain the data used to process this group, making
reprocessing it very efficient. Rather than process the next
group, reading sequential edge data from disk and probing
$y_v$ in main memory, we can reprocess the current group,
reading sequential edge data from main memory and probing
$y_v$ in the L3 cache. The latter is substantially faster
than the former, and represents a good payoff so long as 
substantial updates remain in the group. As the intra-host 
edge density is high, we might perform several iterations on 
a group before its y updates dissipate to other groups. 
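Reiteration can be sketched as a sequential pass that repeats each group several times before moving on. This is an illustrative simplification: `groups` is assumed to be a list of node lists (for example, one per host), `repeats` is a hypothetical parameter, and every edge is propagated on every repetition rather than deferring inter-group edges.

```python
def reiterated_pass(groups, out_edges, x, y, repeats=2):
    """Process each group `repeats` times before advancing to the next.

    groups: list of lists of node indices, in processing order.
    out_edges[u]: list of (v, A_vu) pairs. x and y are updated in place.
    """
    for group in groups:
        for _ in range(repeats):
            for u in group:
                z_u = y[u]
                if z_u == 0.0:
                    continue
                x[u] += z_u
                y[u] -= z_u
                # Updates landing back inside the group are picked up
                # again on the next repetition.
                for v, a_vu in out_edges[u]:
                    y[v] += a_vu * z_u
```

Because y stays equal to Ax - x + d throughout, the number of repetitions can be varied per group without any other bookkeeping.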

Another popular grouping is by strongly connected com- 
ponent. Ordered topologically, there is no reason to advance 
from a component until it has satisfactorily converged, as 
there are no edges along which updates from subsequent 
components may return. This approach has the decided ad- 
vantage that the working set of edges and nodes at any point 
in time is only as large as the associated strongly connected 
component, each of which is visited only once. The main 
disadvantage is that computing strongly connected compo- 
nents is difficult in external memory, and an approximation 
should probably be used instead. Notice that we do not ac- 
tually require that the grouping have no back edges, but the 
fewer that exist, the fewer updates return upstream and the 
more effective each pass is. Strongly connected components 
ensure that one pass suffices, but groupings that simply have 
low reverse edge density are highly effective as well. 

There are other interesting groupings that one can imag- 
ine (we will discuss some more in Sections 3.3 and 3.4) and 
the question quickly emerges of which one should be used. 
In fact, we can use several. Our main constraint is that we 
access the edge file sequentially, and therefore we must col- 
locate edges from nodes in the same group. If we have the 
disk space to maintain multiple edge files, we can produce 
an edge file for each grouping, and choose to use a partic- 
ular edge file based on our needs at the time. In reading 
the edge file, we process the collocated nodes in a group, 
and can easily make multiple passes over this data without 
tripping over the intervening edges in the original edge file. 
This approach does not give us locality of reference in the y 
vector, as we have not actually changed node indices. 


Figures 3 and 4 examine the performance benefits of group- 
ing by host and processing each one, two, and three times. 
We also examine 10x reiteration, though only to demonstrate
its limit. As we count operations instead of measuring
execution time, we will need to make some assumptions
about the execution time of subsequent passes. For
presentation reasons, we will assume that subsequent iterations
are free, which is clearly false. However, the 2x and
3x reiterations result in acceleration of nearly 2x and 3x, re- 
spectively, so acceleration clearly exists for more pessimistic 
assumptions. Actual runtimes suggest that subsequent iter- 
ations are cheap, with 2x and 3x reiteration taking roughly 
1.25 and 1.50 times as long, respectively, in a not especially 
well controlled environment. Also, we only need to process
the intra-group edges while reiterating, propagating updates 
along inter-group edges only once finished. 
Figure 3: Reiteration: Total Error

The acceleration for total error is almost 2x and 3x over
SU, suggesting that reiterated updates can be nearly as effective
as multiple passes over the matrix. Of course, there
are diminishing returns, visible as 3x and 10x converge.
Figure 4: Reiteration: Maximum Error


Additional iterations help maximum error as well, but the 
diminishing returns are even more pronounced. 


Related Work: Arrangement by strongly connected com- 
ponents has appeared several times in various forms. Eiron 
et al. [4] note that pages with no outlinks form a large frac- 
tion of the web and describe how to infer their ranks from the 
stationary probabilities of a modified graph with these nodes 
removed. More generally, Arasu et al. [1] and Langville and 
Meyer [10] view the problem as a block upper triangular 
linear system, processing strongly connected components in 
turn. Their techniques focus on decomposing the Markov 
chain, and require a strict topological order. 

These approaches are captured by reiteration over the 
equivalent grouping. Moreover, UpdateIteration can take
advantage of groupings that are only mostly topological. 
This flexibility addresses concerns of the substantial effort 
needed for data organization and maintenance, and enables 
grouping by host/domain, which was not possible in the 
more rigid block triangular techniques. 


3.3 Selective Updates 


While $z_u = y_u$ is clearly one effective choice, an alternate
choice is $z_u = 0$. In effect, we can choose not to update node
u. Clearly if $y_u = 0$ we need not expend effort to propagate
$y_u$ through A, as we will simply be adding zero to several
locations in y. Even when yu is non-zero but small, we may 
want to defer the update until the gains are more in line 
with the typical entry. With this in mind, there are various 
predicates we could use to decide if we should process yu. 
We will specifically consider:

$$\text{Effort}: \text{ Set } z_u = y_u \text{ iff } \frac{|r_u y_u|}{\deg_u} \ge \frac{\sum_v |r_v y_v|}{\sum_v \deg_v},$$

selecting entries with the highest anticipated progress $|r_u y_u|$
per expended effort $\deg_u$. There are other predicates that
could be used, each resulting from a different view of which
entries are important to process. Examples include choosing
those entries with largest relative error $|(Px - x)_u| / |x_u|$ or
those entries whose range of possible ordinal ranks is largest.

Selective updating has some interesting interaction with
sequential and reiterated updates. As we run a pass of sequential
updates the average value of $|r_u y_u| / \deg_u$ will change,
and while we could maintain the average exactly by carefully
watching the changes in y, we can also do a more efficient
approximation by assuming that $\|y\|_1$ decreases by exactly
$r_u |z_u|$. Reiteration is similar, in that each reiteration lowers
the weight in a group markedly, by a factor of at least $1 - r_u$,
but not the average value over all of y. We could base our
decisions on the group's average, shrinking with the values
we consider, or on the entire average over y.
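One way to sketch an effort-based predicate inside a sequential pass is shown below. The exact threshold is an assumption made for this illustration: a node is processed when its anticipated progress per out-edge, |r_u y_u| / deg_u, is at least the graph-wide total of |r_v y_v| divided by the total number of edges, measured once at the start of the pass. The name `selective_pass` and the treatment of nodes without out-edges as degree one are likewise hypothetical.

```python
def selective_pass(out_edges, r, x, y):
    """Sequential pass that skips nodes with below-average progress per edge.

    out_edges[u]: list of (v, A_vu) pairs; r[u]: reset probability of u.
    Returns the number of edges actually processed.
    """
    total_edges = max(sum(len(e) for e in out_edges), 1)
    # Assumed threshold: average anticipated progress per edge,
    # fixed at the start of the pass rather than maintained exactly.
    threshold = sum(abs(r_v * y_v) for r_v, y_v in zip(r, y)) / total_edges
    edges_processed = 0
    for u, edges_u in enumerate(out_edges):
        deg_u = max(len(edges_u), 1)  # assumption: dangling nodes count as degree 1
        if y[u] == 0.0 or abs(r[u] * y[u]) / deg_u < threshold:
            continue  # equivalent to choosing z_u = 0
        z_u = y[u]
        x[u] += z_u
        y[u] -= z_u
        for v, a_vu in edges_u:
            y[v] += a_vu * z_u
        edges_processed += len(edges_u)
    return edges_processed
```

Skipped entries remain in y untouched, so they are simply reconsidered on a later pass once the average has fallen to their level.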


While the gains of selective updating in terms of compu- 
tation and memory accesses are clear, savings in terms of 
disk accesses are less so. It is not possible to skip entries 
on disk at no cost; data is read from disk in blocks, and the 
cost is amortized over all entries in the block. Likewise, disk 
prefetching will prepare subsequent blocks cheaply, and it is 
unclear that we gain anything by ignoring edge data passed 
to us. To address this somewhat, it is certainly possible to 
apply selective updating at a coarser scale than the node 
level. One could skip entire groups of entries at a time, permitting
a volume of edges to be passed over at once and
resulting in entire disk blocks being skipped.

Alternately, Kamvar, Haveliwala, and Golub note in [5] 
that some pages converge more slowly than others, deter- 
mining which these are at runtime by observing their relative 
change in ranks each round. While they use converged val- 
ues to cull edges associated with converged nodes, we might 
base a grouping scheme (a la Section 3.2) on convergence 
rate, determined in a similar manner at run time. Emitting 
an appropriately grouped edge file can be done efficiently in 
a single pass so long as the number of groups is not terri- 
bly large. This organization of the edge file allows efficient 
passes over prefixes of the edge file, letting us efficiently 
process each group at an arbitrary rate. Understanding and 
experimenting with coarse-grained selective updating and 
grouping is interesting future research. 


Figures 5 and 6 examine the Effort predicate applied to 
previous schemes (denoted in the figure labels by “E+”).
The acceleration we see here is substantial, as selective up- 
dating takes advantage of the initially high variability of 
magnitudes in y. We stress that actual acceleration will be 
less, although some may be recouped via clever grouping. 
Figure 5: Selective Updating: Total Error

We see substantial acceleration in terms of edges processed,
which is, admittedly, a somewhat suspect measure.
The message is that the work that needs to be done is less.
Figure 6: Selective Updating: Maximum Error


Initial acceleration is especially pronounced here, as the 
selective updates immediately leap two orders of magnitude. 


Related Work: Selective updating can be seen in the work 
of Kamvar, Haveliwala, and Golub [5], who describe a power 
iteration process wherein entries $x_u$ that appear to have converged
are frozen, saving the recomputation of these
entries every iteration. Moreover, edges associated with the 
frozen nodes are trimmed from the edge file, reducing disk 
IO required. Freezing is analogous to the non-transmission 
of an update, and the edge trimming is clearly the basis of 
the grouping discussed previously, though more final. 

The main difference between [5] and UpdateIteration is
that in the former the recipient decides whether an update 
will be propagated, and it is conceivable that significant up- 
dates may be ignored. This is particularly evident during 
incremental changes to the web graph. As the matrix A 
changes over time, previous stationary distributions prove 
good starting points for converging to the new stationary 
distribution. But if only a few links change, entries $x_u$
not incident to a changed edge will remain stationary after 
an iteration and will be frozen and never updated. 


3.4 Incremental Updates 


Over time the adjacency structure of the web changes, 
and we will want to compute the stationary distribution of 
a new chain that differs from the old in a relatively small 
number of locations. The stationary distribution of the old 
chain is generally viewed as a good first approximation, and 
indeed it is easy and intelligent to restart PowerIteration on
a new chain using the old x. 

Restarting UpdateIteration from a specific x appears non-trivial,
as we must compute y = Ax - x + d, involving a matrix
multiplication. In fact, the process is much simpler: Let
A and B describe the old and new edge transition matrices,
and consider the two associated update vectors $y_A$ and $y_B$,

$$y_A = Ax - x + d \quad\text{and}\quad y_B = Bx - x + d.$$

We can relate the update vector $y_B$ to its antecedent $y_A$ as

$$y_B = y_A + (B - A)x.$$

This equivalence shows how to efficiently update $y_A$ to $y_B$,
allowing us to efficiently reinvoke UpdateIteration(B, x, $y_B$).
The effort required in this matrix-vector multiplication is
proportional to the number of non-zero entries in B - A,
corresponding to the number of changed edge weights.
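The relation between the old and new update vectors, y_B = y_A + (B - A)x, can be sketched as a loop over only the changed edges. The representation of a change as an (u, v, old_weight, new_weight) tuple and the names `residual` and `incremental_residual` are assumptions made for this illustration; `residual` is included only to compare against a full recomputation.

```python
from typing import Dict, Iterable, List, Tuple


def residual(edges: Dict[Tuple[int, int], float], x: List[float],
             d: List[float]) -> List[float]:
    """Full recomputation y = A x - x + d; edges maps (u, v) to A_vu."""
    y = [d_v - x_v for d_v, x_v in zip(d, x)]
    for (u, v), a_vu in edges.items():
        y[v] += a_vu * x[u]
    return y


def incremental_residual(y_a: List[float], x: List[float],
                         changes: Iterable[Tuple[int, int, float, float]]
                         ) -> List[float]:
    """Return y_B = y_A + (B - A) x, touching only the changed edges.

    changes: (u, v, old_weight, new_weight) for each edge whose weight
    differs between A and B; added or removed edges use 0.0 on one side.
    """
    y_b = list(y_a)
    for u, v, old_w, new_w in changes:
        y_b[v] += (new_w - old_w) * x[u]
    return y_b
```

The cost is one multiply-add per changed edge, independent of the size of the rest of the graph.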


Several approaches to personalization of PageRank are 
based on personalization of the reset distribution [8, 9], shift- 
ing influence to those sites that the distribution favors, and 
the sites linked by them. Recall from Theorem 1 that x
converges to the stationary distribution of the chain with
reset distribution d = y - (Ax - x). Personalization of the
reset distribution is easily performed by incorporating any 
changes to d into y instead, changing x’s limit appropri- 
ately. It is worth stressing that x’s limit is defined by the 
distribution proportional to d, and we need not worry about 
renormalizing d if we only make a few changes. 
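Concretely, if the reset vector changes from d to a new vector while A and x are held fixed, the update vector changes by the same amount. A short worked version, with $d'$ and $y'$ as notation introduced here for the new reset vector and the new update vector:

```latex
\begin{align*}
y' = Ax - x + d' = (Ax - x + d) + (d' - d) = y + (d' - d).
\end{align*}
```

Boosting a single page u by some amount therefore corresponds to adding that amount to $y_u$ before reinvoking UpdateIteration.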


Finally, much research into Markov chain based ranking
is exploratory: the best settings of the weights in A
and the vectors d and r are not known. Uniform weights seem
natural as defaults, but are clearly primitive choices. Exploring
link weighting schemes based on content analysis or
resetting policies based on content quality requires efficient
recomputation of ranks. Each of these explorative choices,
updating $A_{vu}$, $r_u$, and $d_u$ values, is easy in UpdateIteration,
corresponding to simple updates to y.


In these three cases above, the changes to the Markov 
chain often result in sparse updates to y: most of the edges 
in the graph are stable between recrawls, and much of per- 
sonalization of reset distributions is localized (upweighting 
a few trusted/bookmarked pages, for example). In this con- 
text, selective updating of Section 3.3 is well suited to effi- 
ciently process just those substantial entries, and leave the 
converged regions of the graph untouched. Of course to ac- 
commodate this properly, it makes sense to maintain an edge 
file of those parts of the graph that experience frequent edge 
churn, so that we needn’t pass over the entire graph. 

The fine granularity of sequential updates also allows a 
very smooth incremental update: we can decompose any 
update to the adjacency matrix into a set of small updates 
to the links of each node, which we apply as we visit each 
node. We need not pause the system to compute (B - A),
but can apply the implications of changes at each node in 
turn. This becomes all the more relevant in a distributed 
setting where such pauses could destroy parallelism. 
Figure 7 compares various techniques applied to a converged
vector y that has had 1000 random positions updated
randomly by +1/n, emulating either a change in the
link structure or reset distribution. For small initial $\|y\|_1$,
the scale of the updates does not affect the shape of the
curves, so the choice 1/n is arbitrary.


Figure 7: Incremental Updating (Total Error against 800M Edges Processed, for PI, SU, and E+SU)


It is difficult to characterize the acceleration of the incre- 
mental updates by a multiplicative factor, as it is clearly 
a different shape than the standard curves. Several orders 
of magnitude are gained immediately, with the slope arriv- 
ing at the shape of Figure 5 as the initially concentrated y 
vector is distributed more uniformly. 
Figure 8: Incremental Updating


Maximum error exhibits the same behavior as total error, 
dropping rapidly as the initially sparse vector is dispersed. 
The initial hiccup again reflects the sensitive nature of the 
maximum error measure. 


Related Work: Chien et al. [3] describe an approach to 
incremental updating that is based on the construction and 
analysis of a new Markov chain on nodes within a modest
neighborhood of the graph changes, and a supernode
representing nodes outside this horizon. Their approach is
similar in spirit to ours, in that attention is restricted to
the relatively small region where change may occur. However,
rather than fix a region and degree of accuracy, UpdateIteration
discovers where updates are needed as it goes,
accommodating any degree of accuracy fluidly. 


Haveliwala [8] and Jeh and Widom [9] have done work on
efficient personalization, observing that the function mapping
reset distributions to stationary distributions is linear.
This enables very efficient ways of synthesizing personalized
PageRanks from a set of precomputed PageRanks based on various
reset distributions. For example, a page's $d_u$ value can be
increased by folding in the stationary distribution of a
random walk that resets to only that page, exactly analogous
to increasing and propagating $y_u$.
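
The linearity can be made explicit in the notation of this note.
The short derivation below is a sketch that assumes every reset
probability $r_u$ is strictly positive; $e_u$ (the indicator
vector of page $u$), the scalar $c$, and the notation $x(d)$ for
the converged vector under reset vector $d$ are introduced here
for illustration only. At convergence $y = Ax - x + d = 0$, and
since $\|Az\|_1 \le \sum_u (1 - r_u)|z_u|$ (Appendix A), the
series below converges:

$$ x(d) = (I - A)^{-1} d = \sum_{k \ge 0} A^k d , $$

$$ x(d + c\,e_u) = x(d) + c\,x(e_u) . $$

Adding $c$ to $y_u$ at a converged state and propagating until
$y$ vanishes again accumulates the term $c\,x(e_u)$ into $x$ in
place.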


3.5 Floating Point Implications 


Our ability to choose z arbitrarily has implications for
floating point error. We have the flexibility to always choose
$z_u$ to be a power of two, so that its addition to x will
result in nominal floating point loss. This is harder to
guarantee with y, as transmission along weighted edges will
change $z_u$ from a power of two. Understanding and improving
floating point behavior has positive implications for the
introduction of strength reduction and low precision
arithmetic, of particular interest in this setting where
maintaining all of x or y in memory is challenging.
Additionally, UpdateIteration propagates and combines updates
$z_u$, which are typically of smaller magnitude than the $x_u$
values that PowerIteration operates with, and the precision
maintained is thus higher.
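
One way to realize such a choice, sketched under assumptions:
round each $y_u$ toward zero to a power of two and use that as
$z_u$, leaving the remainder in $y_u$ for a later update. The
helper name `power_of_two_update` is hypothetical, and rounding
toward zero is only one of several admissible rules; it keeps
$z_u$ between zero and $y_u$, as Theorem 2 requires.

```python
import math

def power_of_two_update(y_u):
    """Return the largest-magnitude power of two not exceeding |y_u|,
    carrying the sign of y_u; returns 0.0 when y_u is 0."""
    if y_u == 0.0:
        return 0.0
    _, exponent = math.frexp(y_u)          # |y_u| = m * 2**exponent, 0.5 <= m < 1
    magnitude = math.ldexp(0.5, exponent)  # 2**(exponent - 1) <= |y_u|
    return math.copysign(magnitude, y_u)

for y_u in (0.3, 1.0, -5.0, 1e-9):
    z_u = power_of_two_update(y_u)
    # z_u has a one-bit significand and captures more than half of y_u.
    assert abs(y_u) / 2 < abs(z_u) <= abs(y_u)
    print(y_u, z_u, y_u - z_u)
```

Because more than half of $y_u$ is moved at each update, the
bound of Theorem 2 still gives a decrease of at least
$r_u |y_u| / 2$ in $\|y\|_1$ per update under this rule.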


3.6 Distribution and Robustness 


If we remove the sequential behavior from sequential updates,
we see that updates in UpdateIteration can actually be totally
asynchronous. Moreover, our choices for $z_u$ are made locally
with only a modicum of global information.
This allows for a very smooth distributed implementation, in 
which the only coordination between compute nodes that is 
required is eventual communication of the updates applied. 
We can delay and reorder inter-node z transmissions until 
the updates are significant, batching and trimming network 
overhead. Clearly, the best update schedule is highly de- 
pendent on the system topology, and we refrain from giving 
explicit suggestions here. 

In an extreme case of delay, a compute node may be un- 
available for a long period of time or even crash. Other 
compute nodes can continue in its absence, functioning un- 
der the belief that the PageRanks associated with that com- 
pute node simply have not changed. If the node comes on- 
line again it simply reenters the computation, transmitting 
and receiving updates. As noted for incremental updates, 
the granularity of sequential updating is very fine, and the 
amount of work needed to roll forward from any checkpoint 
can be arbitrarily small. 
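
The behavior described in this section can be simulated in a
few lines. The following is a sketch under stated assumptions,
not a distributed implementation: two simulated machines own
halves of a synthetic graph, cross-machine updates are batched
in a `pending` buffer and delivered every few rounds, machine 1
is offline for the first `down_rounds` rounds, and nodes are
visited in random order. All names and parameter values are
hypothetical, as is the uniform reset probability r = 0.15.

```python
import random

def residual(out_links, x, d, r=0.15):
    """Recompute y = Ax - x + d directly from x."""
    y = [d_u - x_u for d_u, x_u in zip(d, x)]
    for u, targets in enumerate(out_links):
        for v in targets:
            y[v] += (1.0 - r) * x[u] / len(targets)
    return y

def simulate(out_links, d, owner, down_rounds=25, flush_every=3,
             r=0.15, tol=1e-12, seed=0):
    """Run updates on two simulated machines with delays and an outage."""
    rng = random.Random(seed)
    n = len(d)
    x = [0.0] * n
    y = list(d)            # residual visible to the owning machine
    pending = [0.0] * n    # cross-machine updates not yet delivered
    order = list(range(n))
    rounds = 0
    while sum(map(abs, y)) + sum(map(abs, pending)) > tol:
        rounds += 1
        machine1_up = rounds > down_rounds
        rng.shuffle(order)               # arbitrary update order
        for u in order:
            if owner[u] == 1 and not machine1_up:
                continue                 # offline nodes do not update
            z = y[u]
            x[u] += z
            y[u] -= z
            share = (1.0 - r) * z / len(out_links[u])
            for v in out_links[u]:
                if owner[v] == owner[u]:
                    y[v] += share
                else:
                    pending[v] += share  # delayed transmission
        if rounds % flush_every == 0:
            for v in range(n):
                if owner[v] == 0 or machine1_up:
                    y[v] += pending[v]
                    pending[v] = 0.0
    return x, rounds

n = 100
graph = [[(u + 1) % n, (7 * u + 3) % n] for u in range(n)]
d = [1.0 / n] * n
owner = [0 if u < n // 2 else 1 for u in range(n)]
x, rounds = simulate(graph, d, owner)
leftover = sum(map(abs, residual(graph, x, d)))
print(rounds, leftover)
```

The final check recomputes y = Ax - x + d from x alone, so a
small `leftover` indicates that neither the delays nor the
outage lost any probability mass.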


3.7 Decentralization 


The Markov chain we have studied simulates the propaga- 
tion of probability mass through a directed communication 
network whose nodes happen to be computational agents. 
The propagation of updates is easily performed within the 
communication network, as updates are only transmitted 
along links. The initial values of x = 0 and y = d are easily 
chosen, as d needn’t be normalized. As the network changes, 
in the incremental fashion suggested by Section 3.4, the nec- 
essary updates to y are computable by the source of the edge 
that has arrived or departed. Gracefully departing nodes
remove their incoming edges using this update mechanism and
apply the update $z_u = y_u - d_u$ before departing.
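
As an illustration of why the source of an edge can compute the
update locally: since y = Ax - x + d, changing A to A' changes
y by (A' - A)x, and an edge change at node u alters only column
u of A. The sketch below assumes that node u splits its
non-reset mass evenly over its out-links with a uniform reset
probability r = 0.15, and that the new edge is not already
present; `add_edge`, `sweep`, and `residual` are hypothetical
names.

```python
def residual(out_links, x, d, r=0.15):
    """Recompute y = Ax - x + d directly from x."""
    y = [d_u - x_u for d_u, x_u in zip(d, x)]
    for u, targets in enumerate(out_links):
        for v in targets:
            y[v] += (1.0 - r) * x[u] / len(targets)
    return y

def sweep(out_links, x, y, r=0.15):
    """Apply z_u = y_u at each node in turn: x += z, y += Az - z."""
    for u, targets in enumerate(out_links):
        z = y[u]
        x[u] += z
        y[u] -= z
        share = (1.0 - r) * z / len(targets)
        for v in targets:
            y[v] += share

def add_edge(out_links, x, y, u, v, r=0.15):
    """Insert edge u -> v and patch y so that y = Ax - x + d still holds.
    Only column u of A changes, so the patch needs only x[u]."""
    targets = out_links[u]
    k = len(targets)
    mass = (1.0 - r) * x[u]
    for w in targets:
        y[w] += mass / (k + 1) - mass / k   # existing links get a smaller share
    y[v] += mass / (k + 1)                  # the new link gets its share
    targets.append(v)

n = 50
graph = [[(u + 1) % n, (7 * u + 3) % n] for u in range(n)]
d = [1.0 / n] * n
x, y = [0.0] * n, list(d)
for _ in range(10):
    sweep(graph, x, y)

add_edge(graph, x, y, 4, 20)
expected = residual(graph, x, d)   # y recomputed against the new graph
drift = max(abs(a - b) for a, b in zip(y, expected))
print(drift)
```

Edge removal is the mirror image: the remaining out-links gain
share and the removed target loses its share. The patch can
leave negative entries in y, which subsequent updates propagate
like any other.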


4. CONCLUSIONS AND FUTURE WORK 


We have examined an algorithmic reformulation of the 
traditional power iteration algorithm based on the propagation
of updates rather than values. UpdateIteration enables
several algorithmic optimizations that result in more
efficient convergence. Moreover, the optimizations are well 
suited to the problems of incremental and personalized up- 
dates to the underlying Markov chain, and permit flexible 
operation in a distributed setting. 

The optimizations presented here are likely just a sam- 
pling of what can be done to accelerate computation of 
PageRank. These optimizations are intended to take ad- 
vantage of particular features of computer systems, and it 
seems likely that other features may yet be exploited, both 
for performance and potentially quality of ranking.
Techniques such as Arnoldi Iteration and unsymmetric Lanczos
are tempting targets, as is the power extrapolation approach
of Kamvar et al. [7]. Additionally, there is work to be done
exploring the new possibilities enabled through efficient
PageRank computation.

7. APPENDIX A: PROOFS


We now look at the two deferred proofs from Section 2. 
Recall that Theorem 1 requires the entries of d be non- 
negative. 


PROOF OF THEOREM 1. $Px - x$ and $y = Ax - x + d$ differ
only in the amount of $d$ added to $Ax - x$. We can thus write
their difference as

$$ y - (Px - x) = d - d\,r^T x / \|d\|_1 . \tag{1} $$

Summing the coordinates of the vectors on both sides of (1),
and noting that $\sum_u (Px)_u = \sum_u x_u$ and
$\sum_u d_u = \|d\|_1$, gives

$$ \sum_u y_u = \|d\|_1 - r^T x . \tag{2} $$

To prove the first stated inequality, we move $y$ to the right
hand side of (1), take norms, and use the triangle inequality.

$$ \|Px - x\|_1 \le \|y\|_1 + \big|\, \|d\|_1 - r^T x \,\big| . \tag{3} $$

Substituting $\sum_u y_u$ for $\|d\|_1 - r^T x$ and then
$|\sum_u y_u| \le \|y\|_1$,

$$ \|Px - x\|_1 \le \|y\|_1 + \Big| \sum_u y_u \Big| \le 2\|y\|_1 . \tag{4} $$

Similarly, the second stated inequality results from the
inequalities

$$ \|x\|_1 \ge r^T x = \|d\|_1 - \sum_u y_u \ge \|d\|_1 - \|y\|_1 , \tag{5} $$

with the inequality $\|x\|_1 \ge r^T x$ following as all
$|r_u| \le 1$. □


The proof of Theorem 2 relies on the assumption that the
coordinates of $z$ lie between zero and the corresponding $y_u$.


PROOF OF THEOREM 2. As each $z_u$ lies between zero and
$y_u$, we have that $\|y - z\|_1 = \|y\|_1 - \|z\|_1$, and
starting from the triangle inequality

$$ \|y + Az - z\|_1 \le \|y - z\|_1 + \|Az\|_1 \tag{6} $$

$$ = \|y\|_1 - \|z\|_1 + \|Az\|_1 . \tag{7} $$

Column $u$ of $A$ sums to $1 - r_u$, and thus
$\|Az\|_1 \le \sum_u (1 - r_u)|z_u|$.

$$ \|y + Az - z\|_1 \le \|y\|_1 - \sum_u |z_u| + \sum_u (1 - r_u)|z_u| . \tag{8} $$

Collecting the summands yields the claimed bound. □
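
As a sanity check, the bound can be tested numerically.
Collecting the summands in (8) gives a right hand side of
$\|y\|_1 - \sum_u r_u |z_u|$. The sketch below draws random
vectors y of mixed sign and random z with each $z_u$ between
zero and $y_u$, and records the smallest gap between the two
sides. The graph, the per-node reset probabilities, and all
names are hypothetical choices made for illustration.

```python
import random

rng = random.Random(1)
n = 30
r = [rng.uniform(0.05, 0.5) for _ in range(n)]            # reset probabilities r_u
out_links = [rng.sample(range(n), 3) for _ in range(n)]   # three out-links per node

def apply_A(z):
    """Compute Az, where column u of A spreads (1 - r_u) evenly over u's out-links."""
    out = [0.0] * n
    for u, targets in enumerate(out_links):
        for v in targets:
            out[v] += (1.0 - r[u]) * z[u] / len(targets)
    return out

def l1(v):
    return sum(abs(t) for t in v)

worst_slack = float("inf")
for _ in range(1000):
    y = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    z = [rng.random() * y_u for y_u in y]    # each z_u lies between 0 and y_u
    Az = apply_A(z)
    lhs = l1([y_u + a - z_u for y_u, a, z_u in zip(y, Az, z)])
    rhs = l1(y) - sum(r_u * abs(z_u) for r_u, z_u in zip(r, z))
    worst_slack = min(worst_slack, rhs - lhs)
print(worst_slack)
```

A non-negative `worst_slack` (up to rounding) is consistent
with the theorem; it is of course no substitute for the proof.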


